【DAY25】BeautifulSoup

2023 iThome Ironman Contest (鐵人賽)

DAY 25

Modern Web

30 Days of Full-Stack: Chatting About Building Websites, part 25 of the series

Tags: 15th Ironman Contest, python, beautifulsoup

Bonnie1226

Team: 消波塊上的海洋貓貓

2023-10-10 22:43:14

Today I'll continue by introducing another third-party package for web scraping: BeautifulSoup.

BeautifulSoup

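Processing the response means processing the HTML:
- Use BeautifulSoup() to turn the returned HTML string into a BeautifulSoup object.
- Then use the find or select methods to locate tags.
Locating elements in the HTML:
- First find where the tag sits using the browser's developer tools.
- Then use find or select (which takes a CSS selector) to filter for that specific tag.
  - Locating by id works best, since an id should be unique within a page.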
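The workflow above can be sketched roughly as follows. This is a minimal example, assuming a small hard-coded HTML string stands in for the response body you would normally get from a request; the ids, classes, and text are made up for illustration. It shows parsing with BeautifulSoup(), locating a tag by id with find, and locating tags with a CSS selector via select and select_one.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a fetched response body
html = """
<html><body>
  <div id="main">
    <p class="title">Hello</p>
    <p class="title">World</p>
  </div>
</body></html>
"""

# Turn the HTML string into a BeautifulSoup object
soup = BeautifulSoup(html, "html.parser")

# find(): locate a tag by its id (keyword arguments filter on attributes)
main = soup.find(id="main")
print(main.name)  # div

# select(): locate all tags matching a CSS selector
titles = soup.select("#main p.title")
print([p.get_text() for p in titles])  # ['Hello', 'World']

# select_one(): like select(), but returns only the first match
print(soup.select_one("#main p.title").get_text())  # Hello
```

Starting from the id and narrowing down from there keeps the selector short and less likely to break when unrelated parts of the page change.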

Installation

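`pip install beautifulsoup4`

(`pip install bs4` also works, because `bs4` on PyPI is a placeholder package that pulls in `beautifulsoup4`. Either way, the import name is `bs4`.)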

Methods

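Finding tags with find and findAll

find: returns only the first matching tag (or None if nothing matches).
findAll: returns every tag that matches, as a list. In bs4 the preferred name is find_all; findAll is kept as an older alias.
- findAll(name, attrs)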

soup.find()

soup.find(name, attrs, recursive, text, **kwargs)

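Parameters:
- name: the tag name to search for.
- attrs: the tag's attributes, usually given as a dict.
- recursive: whether to search all descendants rather than only direct children. Defaults to True.
- text: matches tags by their text content. Newer bs4 versions call this parameter string; text is kept as an older alias.
- **kwargs: any other keyword arguments are treated as attribute filters, e.g. id="main" or class_="example".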

from bs4 import BeautifulSoup

html = '<div class="example">it30!!!</div><div class="example">day23!!!</div>'
soup = BeautifulSoup(html, 'html.parser')

# Find the first div tag whose class is "example"
div_tag = soup.find('div', {'class': 'example'})
print(div_tag.text)  # Output: it30!!!
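To see how the remaining parameters change the result, here is a small sketch on a made-up nested HTML snippet (the ids, classes, and text are hypothetical). It contrasts the default recursive search with recursive=False, uses the class_ keyword as an attribute filter, matches on text with string= (named text= in older versions), and shows that find returns None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical nested HTML: one <p> inside a <section>, one <p> directly under the div
html = '<div id="outer"><section><p class="note">nested</p></section><p>direct</p></div>'
soup = BeautifulSoup(html, "html.parser")
outer = soup.find("div", id="outer")

# recursive=True (default): all descendants are searched, so the nested <p> comes first
print(outer.find("p").text)                    # nested

# recursive=False: only the direct children of <div id="outer"> are checked
print(outer.find("p", recursive=False).text)   # direct

# Keyword arguments filter on attributes; class_ avoids clashing with Python's class keyword
print(outer.find("p", class_="note").text)     # nested

# string= matches on the tag's text content
print(outer.find("p", string="direct").text)   # direct

# find() returns None when nothing matches, so check before using .text
missing = soup.find("span")
print(missing is None)                         # True
```

Checking for None before reading .text avoids an AttributeError when a page's structure changes and the tag is no longer there.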

soup.findAll()

soup.findAll(name, attrs, recursive, text, limit, **kwargs)

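Parameters:
- name: the tag name to search for.
- attrs: the tag's attributes, usually given as a dict.
- recursive: whether to search all descendants rather than only direct children. Defaults to True.
- text: matches tags by their text content (called string in newer bs4 versions).
- limit: the maximum number of tags to return. By default, every match is returned.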

from bs4 import BeautifulSoup

html = '<div class="example">it30!!!</div><div class="example">day23!!!</div>'
soup = BeautifulSoup(html, 'html.parser')

# Find all div tags whose class is "example"
div_tags = soup.findAll('div', {'class': 'example'})

for tag in div_tags:
    print(tag.text)
# Output:
# it30!!!
# day23!!!
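A short sketch of a few more findAll behaviors, using the modern name find_all on a made-up list (the tags and text are hypothetical). It shows limit capping the number of results, name accepting a list of tag names, an empty list rather than None when nothing matches, and the equivalent selection written as a CSS selector with select.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: three list items and one paragraph
html = '<ul><li>a</li><li>b</li><li>c</li></ul><p>d</p>'
soup = BeautifulSoup(html, "html.parser")

# limit: stop after the first 2 matches
first_two = soup.find_all("li", limit=2)
print([li.text for li in first_two])   # ['a', 'b']

# name can be a list, matching several tag names in document order
mixed = soup.find_all(["li", "p"])
print([t.text for t in mixed])         # ['a', 'b', 'c', 'd']

# find_all returns an empty list (not None) when nothing matches
print(soup.find_all("span"))           # []

# The same <li> selection written as a CSS selector
print([li.text for li in soup.select("ul > li")])  # ['a', 'b', 'c']
```

Because an empty result is still a list, looping over it is always safe, which differs from find, where the None case has to be handled explicitly.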

See you tomorrow!